Pareto Front Identification from Stochastic Bandit Feedback
Abstract
We consider the problem of identifying the Pareto front for multiple objectives from a finite set of operating points. Sampling an operating point yields a random vector whose coordinates correspond to the values of the individual objectives. The Pareto front is the set of operating points that are not dominated by any other operating point with respect to all objectives (in terms of the means of their objective values). We propose a confidence-bound algorithm to approximate the Pareto front, and prove problem-specific lower and upper bounds, showing that the sample complexity is characterized by natural geometric properties of the operating points. Experiments confirm the reliability of our algorithm. For the problem of finding a sparse cover of the Pareto front, we propose an asymmetric covering algorithm of independent interest.
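To make the setting concrete, the sketch below shows one way such confidence-bound elimination can be organized. It is not the paper's algorithm: the function names (`cb_pareto_identification`, `sample`), the round-robin sampling rule, the particular confidence radius, and the fixed sampling `budget` are illustrative assumptions; only the notion of Pareto dominance of mean vectors comes from the abstract.

```python
import numpy as np

def dominates(u, v):
    """True if vector u dominates v: at least as good in every objective and
    strictly better in at least one (larger values assumed better)."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u >= v) and np.any(u > v))

def empirical_pareto_front(means):
    """Indices of points whose mean vectors are not dominated by any other point."""
    return [i for i in range(len(means))
            if not any(dominates(means[j], means[i]) for j in range(len(means)) if j != i)]

def cb_pareto_identification(sample, n_points, n_objectives, delta=0.05, budget=20000):
    """Illustrative confidence-bound sketch: sample the least-explored undecided point,
    maintain per-objective confidence intervals, reject a point once another point
    dominates it with high confidence, and accept it once no other point can still
    dominate it."""
    counts = np.zeros(n_points, dtype=int)
    sums = np.zeros((n_points, n_objectives))
    undecided, accepted = set(range(n_points)), set()

    for t in range(budget):
        if not undecided:
            break
        i = min(undecided, key=lambda k: counts[k])   # least-sampled undecided point
        sums[i] += sample(i)                          # one noisy objective vector
        counts[i] += 1

        n = np.maximum(counts, 1)[:, None]
        mu = sums / n
        rad = np.sqrt(np.log(4.0 * n_points * n_objectives * (t + 2) ** 2 / delta) / (2.0 * n))
        ucb, lcb = mu + rad, mu - rad

        for k in list(undecided):
            others = [j for j in range(n_points) if j != k]
            if any(dominates(lcb[j], ucb[k]) for j in others):    # dominated with high confidence
                undecided.discard(k)
            elif all(np.any(lcb[k] > ucb[j]) for j in others):    # cannot be dominated anymore
                accepted.add(k)
                undecided.discard(k)

    return sorted(accepted | undecided)  # points still undecided are kept conservatively
```

For instance, with Bernoulli objectives one could pass `sample = lambda i: np.random.binomial(1, true_means[i])` for some hypothetical array `true_means` of shape `(n_points, n_objectives)`.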
Similar resources
Multi-Objective X-Armed Bandits
Many of the standard optimization algorithms focus on optimizing a single, scalar feedback signal. However, real-life optimization problems often require a simultaneous optimization of more than one objective. In this paper, we propose a multi-objective extension to the standard X-armed bandit problem. As the feedback signal is now vector-valued, the goal of the agent is to sample actions in t...
Multi-objective Contextual Bandit Problem with Similarity Information
In this paper we propose the multi-objective contextual bandit problem with similarity information. This problem extends the classical contextual bandit problem with similarity information by introducing multiple and possibly conflicting objectives. Since the best arm in each objective can be different given the context, learning the best arm based on a single objective can jeopardize the rewar...
The Pareto Regret Frontier for Bandits
Given a multi-armed bandit problem it may be desirable to achieve a smaller-than-usual worst-case regret for some special actions. I show that the price for such unbalanced worst-case regret guarantees is rather high. Specifically, if an algorithm enjoys a worst-case regret of B with respect to some action, then there must exist another action for which the worst-case regret is at least Ω(nK/B),...
Improving the Pareto UCB1 Algorithm on the Multi-Objective Multi-Armed Bandit
In this work, we introduce a straightforward approach for bounding the regret of Multi-Objective Multi-Armed Bandit (MO-MAB) heuristics extended from standard bandit algorithms. The proposed methodology allows us to easily build upon the regret analysis of the heuristics in the standard bandit setting. Using our approach, we improve the Pareto UCB1 algorithm, that is the multi-objective extensi...
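As a rough illustration of the arm-selection rule used by Pareto UCB1-style methods, the sketch below builds an optimistic UCB vector per arm and pulls uniformly at random among arms whose UCB vectors are Pareto-optimal. The exploration bonus shown is a generic UCB1-style term, not the exact constant from the published algorithm, and the function name `pareto_ucb1_step` is hypothetical.

```python
import math
import random
import numpy as np

def pareto_ucb1_step(counts, mean_rewards, t):
    """One illustrative selection step: counts has shape (K,), mean_rewards has
    shape (K, D) for K arms and D objectives, and t is the current round."""
    K, D = mean_rewards.shape
    bonus = np.sqrt(2.0 * math.log(t + 1) / np.maximum(counts, 1))[:, None]
    ucb = mean_rewards + bonus                      # one optimistic value per objective

    def dominated(i):
        return any(np.all(ucb[j] >= ucb[i]) and np.any(ucb[j] > ucb[i])
                   for j in range(K) if j != i)

    pareto_arms = [i for i in range(K) if not dominated(i)]
    return random.choice(pareto_arms)               # uniform over the optimistic Pareto set
```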
Efficient Online Learning under Bandit Feedback
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlated arms. Particularly, we investigate the case when the expected rewards are a Lipschitz function of the arm and extend these results to bandits with arbitrary structure that is known to the decision maker. In these settings, we derive problem specific regret lower bounds and propose both an asymp...